graph : use F32 accumulators for gpt-oss #15312


Draft: wants to merge 1 commit into master
Conversation

@ggerganov ggerganov (Member) commented on Aug 14, 2025

ref #15274

Request F32 accumulators for the attention output multiplication. This is similar to the existing hint for GLM models in llm_graph_context::build_attn():

llama.cpp/src/llama-graph.cpp (lines 1474 to 1482 in 810b9fc):

if (wo) {
    cur = build_lora_mm(wo, cur);
    if (arch == LLM_ARCH_GLM4 || arch == LLM_ARCH_GLM4_MOE) {
        // GLM4 and GLM4_MOE seem to have numerical issues with half-precision accumulators
        ggml_mul_mat_set_prec(cur, GGML_PREC_F32);
    }
}
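
For context, ggml_mul_mat_set_prec() is the ggml hint that requests a higher-precision accumulator for a matrix-multiplication result. Its declaration in ggml.h is along these lines (paraphrased here; see the header for the exact form):

GGML_API void ggml_mul_mat_set_prec(
        struct ggml_tensor * a,     // result tensor of a ggml_mul_mat() call
        enum   ggml_prec     prec); // e.g. GGML_PREC_F32 to request F32 accumulation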

Here we add the same hint for the new llm_graph_context::build_attn_with_sinks():

llama.cpp/src/llama-graph.h (lines 734 to 738 in 810b9fc):

// TODO: temporary to keep the diff small. after the code is public will refactor to simplify this
ggml_tensor * build_attn_with_sinks(
        llm_graph_input_attn_kv_unified_iswa * inp,
        ggml_tensor * wo,
This build_attn_with_sinks() path exists only temporarily and will eventually be merged into llm_graph_context::build_attn().
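
The diff itself is not shown above. As a minimal sketch, assuming the hint is applied unconditionally in this path (only gpt-oss uses build_attn_with_sinks() at the moment) and that the output projection goes through build_lora_mm() as in build_attn(), the change would look roughly like this:

if (wo) {
    cur = build_lora_mm(wo, cur);
    // gpt-oss appears to have numerical issues with half-precision accumulators,
    // so request F32 accumulation for the attention output multiplication
    ggml_mul_mat_set_prec(cur, GGML_PREC_F32);
}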
